Geocoding Textual Documents Through a Hierarchy of Linear Classifiers

Authors

  • Fernando Melo
  • Bruno Martins
Abstract

Most documents can be said to be related to some form of geographic context. Recently, geographic information retrieval has captured the attention of researchers from fields related to text mining and data retrieval, envisioning support for tasks such as map-based document indexing, retrieval and visualization. In this paper, we empirically evaluate automated techniques, based on a hierarchical representation of the Earth's surface and leveraging linear classifiers, for assigning geospatial coordinates of latitude and longitude to previously unseen documents, using only the raw text as input evidence. Results were obtained with models based on Support Vector Machines, over collections of geo-referenced Wikipedia articles in four languages: English, German, Spanish and Portuguese. The best-performing models, based on Support Vector Machines, obtained state-of-the-art results, corresponding to an average prediction error of 86 kilometers and a median error of just 8 kilometers on the English Wikipedia collection. For the German, Spanish and Portuguese collections, which are significantly smaller, the same method obtains average prediction errors of 62, 184 and 109 kilometers, respectively, and median prediction errors of 5, 13 and 21 kilometers.
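The abstract describes assigning coordinates by descending a hierarchical representation of the Earth's surface with linear classifiers, and evaluates predictions by their average and median error in kilometers. The Python sketch below illustrates both ideas under explicit assumptions: the hierarchy is a hypothetical quadtree that splits each latitude/longitude box into four quadrants (the paper's actual discretization is not described here and may differ), each cell's classifier is a hypothetical sparse weight dictionary plus bias (for example the w·x + b decision value of a linear SVM), the descent is greedy, and distances use the haversine formula with an assumed mean Earth radius of 6371 km. It is an illustration, not the authors' implementation.

```python
import math
from statistics import mean, median

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius
WORLD = (-90.0, 90.0, -180.0, 180.0)  # (lat_min, lat_max, lon_min, lon_max)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def children(cell):
    """Hypothetical hierarchy: split a lat/lon box into four equal quadrants."""
    lat0, lat1, lon0, lon1 = cell
    latm, lonm = (lat0 + lat1) / 2, (lon0 + lon1) / 2
    return [(lat0, latm, lon0, lonm), (lat0, latm, lonm, lon1),
            (latm, lat1, lon0, lonm), (latm, lat1, lonm, lon1)]


def centroid(cell):
    """Center of a box in degree space (not the true spherical centroid)."""
    lat0, lat1, lon0, lon1 = cell
    return (lat0 + lat1) / 2, (lon0 + lon1) / 2


def linear_score(weights, bias, features):
    """Decision value w.x + b over sparse {term: value} features."""
    return sum(weights.get(t, 0.0) * v for t, v in features.items()) + bias


def geocode(features, models, depth):
    """Greedy descent: at each level keep the child with the highest linear score.

    `models` (hypothetical) maps a cell to a (weights, bias) pair;
    cells without a model score 0.0.
    """
    cell = WORLD
    for _ in range(depth):
        def score(c):
            weights, bias = models.get(c, ({}, 0.0))
            return linear_score(weights, bias, features)
        cell = max(children(cell), key=score)  # ties go to the first child
    return centroid(cell)  # predicted (lat, lon) is the leaf cell's center


def error_summary(predicted, gold):
    """Mean and median great-circle error (km) over paired (lat, lon) lists."""
    dists = [haversine_km(p[0], p[1], g[0], g[1]) for p, g in zip(predicted, gold)]
    return mean(dists), median(dists)
```

For example, `error_summary(predictions, gold)` returns the (mean, median) pair in kilometers, the two figures reported in the abstract. A greedy descent commits to one branch per level, so an early mistake cannot be undone; keeping several candidate cells per level is a common alternative.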

Similar resources

Text Mining Methods for Mapping Opinions from Georeferenced Documents

With the growing availability of large volumes of textual information on the Web, text mining techniques have been attracting growing interest. One specific text mining problem that is increasingly relevant relates to the detection of textual expressions that refer to opinions on certain topics and services. A second text mining problem, which has also been attracting growing interest, is the ide...

Ontology Evaluation through Text Classification

We present a new method to evaluate a search ontology, which relies on mapping ontology instances to textual documents. On the basis of this mapping, we evaluate the adequacy of ontology relations by measuring their classification potential over the textual documents. This data-driven method provides concrete feedback to ontology maintainers and a quantitative estimation of the functional adequ...

A Geo-textual Search Engine Approach Assisting Disaster Recovery, Crisis Management and Early Warning Systems

This work has been funded by the Oberfrankenstiftung. This paper presents an approach used in geo-textual search engines for application in security-related domains such as disaster recovery or early warning systems. Current approaches suffer from search conditions utilizing some combined scheme of textual and geographical search predicates. Standard retrieval engines support only either text...

Exploiting Semantic Annotations and Q-Learning for Constructing an Efficient Hierarchy/Graph Texts Organization

Tremendous growth in the number of textual documents has produced daily requirements for effective development to explore, analyze, and discover knowledge from these textual documents. Conventional text mining and managing systems mainly use the presence or absence of key words to discover and analyze useful information from textual documents. However, simple word counts and frequency distribut...

Classifying Scientific Publications Using Abstract Features

With the exponential increase in the number of documents available online, e.g., news articles, weblogs, scientific documents, effective and efficient classification methods are required in order to deliver the appropriate information to specific users or groups. The performance of document classifiers critically depends, among other things, on the choice of the feature representation. The comm...

Publication date: 2015